NSF PAR Search | NSF Public Access Repository

GreediRIS: Scalable influence maximization using distributed streaming maximum cover

https://doi.org/10.1016/j.jpdc.2025.105037

Barik, Reet; Cappa, Wade; Ferdous, SM; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth (April 2025, Journal of Parallel and Distributed Computing)

Full Text Available
FuseIM: Fusing probabilistic traversals for influence maximization on exascale systems

Neff, Reece; Zach, Mostafa; Minutoli, Marco; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth; Becchi, Michela (August 2024, ICS 2024)

Full Text Available
Scalable and memory-efficient algorithms for controlling networked epidemic processes using multiplicative weights update method

https://doi.org/10.24963/ijcai.2022/717

Sambaturu, Prathyush; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Vullikanti, Anil. (July 2023, Proc. International Joint Conference on Artificial Intelligence (IJCAI-ECAI))

Full Text Available
High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization

https://doi.org/10.1145/3587135.3592196

Neff, Reece; Minutoli, Marco; Tumeo, Antonino; Becchi, Michela (May 2023, CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers)

FPGAs are promising platforms for accelerating irregular applications due to their ability to implement highly specialized hardware designs for each kernel. However, the design and implementation of FPGA-accelerated kernels can take several months using hardware design languages. High-Level Synthesis (HLS) tools provide fast, high-quality results for regular applications, but lack the support to effectively accelerate more irregular, complex workloads. This work analyzes the challenges and benefits of using a commercial state-of-the-art HLS tool and its available optimizations to accelerate graph sampling. We evaluate the resulting designs and their effectiveness when deployed in a state-of-the-art heterogeneous framework that implements the Influence Maximization with Martingales (IMM) algorithm, a complex graph analytics algorithm. We discuss future opportunities for improvement in hardware, HLS tools, and hardware/software co-design methodology to better support complex irregular applications such as IMM.
Full Text Available
IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization

https://doi.org/10.1109/HiPC56025.2022.00028

Barik, Reet; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth (December 2022, Proceedings of the International Conference on High Performance Computing, Data, and Analytics (HiPC))

Full Text Available
HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures

Chen, Xinyu; Minutoli, Marco; Tian, Jiannan; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Tao, Dingwen (October 2022, The 31st International Conference on Parallel Architectures and Compilation Techniques (PACT 2022))

Influence maximization aims to select the k most influential vertices, or seeds, in a network, where influence is defined by a given diffusion process. Although computing an optimal seed set is NP-hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs large memory footprints. This in turn limits both the problem sizes that can be handled and the approximation quality. In this work, we study the memory footprint of the sampling process collecting reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach (called HBMax) based on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. Our approach, HBMax, uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. Then, it compresses the intermediate reverse reachability information with Huffman coding or bitmap coding, and runs queries on the partially decoded data, or directly on the compressed data, to preserve the memory savings obtained through compression. Considering a NUMA architecture, we scale up our solution on 64 CPU cores and reduce the memory footprint by up to 82.1% with an average 6.3% speedup (encoding overhead is offset by performance gain from memory reduction) without loss of accuracy. For the largest tested graph, Twitter7 (with 1.4 billion edges), HBMax achieves a 5.9× compression ratio and a 2.2× speedup.
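The idea of querying RR sets directly in bitmap form can be illustrated with a small sketch. This is not the HBMax or Ripples implementation: the function names `encode_bitmap` and `greedy_seeds` are hypothetical, Python integers stand in for real bitmaps, and the seed selection shown is plain greedy maximum coverage over already-sampled RR sets.

```python
from typing import Iterable, List


def encode_bitmap(rr_set: Iterable[int]) -> int:
    """Pack the vertex ids of one RR set into a Python int used as a bitmap."""
    bits = 0
    for v in rr_set:
        bits |= 1 << v
    return bits


def greedy_seeds(rr_bitmaps: List[int], num_vertices: int, k: int) -> List[int]:
    """Pick up to k seeds by greedy maximum coverage.

    Membership is tested on the bitmaps themselves (bits & mask), so the
    RR sets are never expanded back into explicit vertex lists.
    """
    covered = [False] * len(rr_bitmaps)
    seeds: List[int] = []
    for _ in range(min(k, num_vertices)):
        best_vertex, best_gain = -1, -1
        for v in range(num_vertices):
            if v in seeds:
                continue
            mask = 1 << v
            # Marginal gain: uncovered RR sets that contain vertex v.
            gain = sum(
                1
                for i, bits in enumerate(rr_bitmaps)
                if not covered[i] and bits & mask
            )
            if gain > best_gain:
                best_vertex, best_gain = v, gain
        seeds.append(best_vertex)
        # Mark every RR set containing the chosen seed as covered.
        best_mask = 1 << best_vertex
        for i, bits in enumerate(rr_bitmaps):
            if bits & best_mask:
                covered[i] = True
    return seeds


# Toy input (made up for illustration): four RR sets over five vertices.
toy_rr_sets = [{0, 1}, {1, 2}, {1, 3}, {3, 4}]
toy_bitmaps = [encode_bitmap(s) for s in toy_rr_sets]
toy_seeds = greedy_seeds(toy_bitmaps, num_vertices=5, k=2)
```

On the toy input, vertex 1 appears in three RR sets and is chosen first; vertex 3 then covers the remaining set.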
Full Text Available
HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures

https://doi.org/10.1145/3559009.3569647

Chen, Xinyu; Minutoli, Marco; Tian, Jiannan; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Tao, Dingwen (October 2022, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques)

Full Text Available
Scalable and Memory-Efficient Algorithms for Controlling Networked Epidemic Processes Using Multiplicative Weights Update Method

https://doi.org/10.24963/ijcai.2022/717

Sambaturu, Prathyush; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Vullikanti, Anil (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence)

We study the problem of designing scalable algorithms to find effective intervention strategies for controlling stochastic epidemic processes on networks. This is a common problem arising in agent-based models for epidemic spread. Previous approaches to this problem focus on either heuristics with no guarantees or approximation algorithms that scale only to networks corresponding to county-sized populations, typically with fewer than a million nodes. In particular, the mathematical-programming-based approaches need to solve the Linear Program (LP) relaxation of the problem using an LP solver, which restricts the scalability of this approach. In this work, we overcome this restriction by designing an algorithm that adapts the multiplicative weights update (MWU) framework, along with the sample average approximation (SAA) technique, to approximately solve the linear program (LP) relaxation for the problem. To scale this approach further, we provide a memory-efficient algorithm that enables scaling to large networks, corresponding to country-size populations, with over 300 million nodes and 30 billion edges. Furthermore, we show that this approach provides near-optimal solutions to the LP in practice.
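To show how an MWU scheme can stand in for an LP solver, here is a generic textbook-style sketch, not the algorithm from the paper. It approximately maximizes the minimum of (A x)_i over the probability simplex, assuming every entry of A lies in [0, 1]. The function name `mwu_max_min`, the matrix `A`, and the parameters `rounds` and `eps` are all assumptions introduced for illustration.

```python
import math
from typing import List


def mwu_max_min(A: List[List[float]], rounds: int = 1000, eps: float = 0.1) -> List[float]:
    """Approximately maximize min_i (A x)_i over the probability simplex.

    Assumes all entries of A are in [0, 1]. Each round, constraints that
    have been satisfied less so far receive more weight, and a simple
    oracle picks the single variable that best serves the weighted sum.
    """
    m, n = len(A), len(A[0])
    cumulative = [0.0] * m  # how much each constraint has been satisfied so far
    counts = [0] * n        # how many rounds each variable was chosen
    for _ in range(rounds):
        base = min(cumulative)  # shift exponents for numerical stability
        weights = [math.exp(-eps * (c - base)) for c in cumulative]
        # Oracle: the column with the largest weighted constraint satisfaction.
        j = max(range(n), key=lambda col: sum(weights[i] * A[i][col] for i in range(m)))
        counts[j] += 1
        for i in range(m):
            cumulative[i] += A[i][j]
    # The averaged choices form the approximate solution.
    return [c / rounds for c in counts]


# Toy instance (made up): two constraints, each served by one variable.
toy_A = [[1.0, 0.0], [0.0, 1.0]]
toy_x = mwu_max_min(toy_A)
toy_value = min(sum(row[j] * toy_x[j] for j in range(len(toy_x))) for row in toy_A)
```

On the toy instance the optimum splits the mass evenly, so `toy_value` should come out close to 0.5. The only access to the problem is through the per-round oracle, which is why schemes of this kind can avoid a general-purpose LP solver.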
Full Text Available
Accelerating Random Forest Classification on GPU and FPGA

https://doi.org/10.1145/3545008.3545067

Shah, Milan; Neff, Reece; Wu, Hancheng; Minutoli, Marco; Tumeo, Antonino; Becchi, Michela (August 2022, ICPP '22: Proceedings of the 51st International Conference on Parallel Processing)

Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving performance of the training of RFs, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification. In this work, we accelerate RF classification on GPU and FPGA. In order to provide efficient support for large datasets, we propose a hierarchical memory layout suitable to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on that layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets on the scale of millions of samples and tens of features, covers various aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that, while reporting the best performance on GPU, our code variants outperform the CSR baseline both on GPU and FPGA. For high accuracy targets, our GPU implementation yields a 5-9× speedup over CSR, and up to a 2× speedup over Nvidia’s cuML library.
Full Text Available
Scalable and Memory-Efficient Algorithms for Controlling Networked Epidemic Processes Using Multiplicative Weights Update Method

Sambaturu, Prathyush; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Vullikanti, Anil (January 2022, IJCAI-ECAI AI for Good Track)

Full Text Available
